IEICE globals.ieice.org Site

Keyword Search Result

[Keyword] deep learning(167hit)

81-100hit(167hit)

Triplet Attention Network for Video-Based Person Re-Identification
Rui SUN Qili LIANG Zi YANG Zhenghui ZHAO Xudong ZHANG

LETTER-Image Recognition, Computer Vision

Publicized: 2021/07/21
Vol: E104-D No:10
Page(s): 1775-1779
Video-based person re-identification (re-ID) aims to retrieve a person across non-overlapping cameras and has achieved promising results owing to deep convolutional neural networks. Due to the dynamic properties of video, background clutter and occlusion are more serious problems than in image-based person re-ID. In this letter, we present a novel triple attention network (TriANet) that simultaneously utilizes temporal, spatial, and channel context information by employing the self-attention mechanism to obtain robust and discriminative features. Specifically, the network has two parts. The first part introduces a residual attention subnetwork, which contains a channel attention module that captures cross-dimension dependencies by using rotation and transformation, and a spatial attention module that focuses on pedestrian features. In the second part, a time attention module is designed to judge the quality score of each pedestrian and to reduce the weight of incomplete pedestrian images, alleviating the occlusion problem. We evaluate our proposed architecture on three datasets: iLIDS-VID, PRID2011, and MARS. Extensive comparative experiments show that our proposed method achieves state-of-the-art results.
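The quality-weighted temporal pooling described in this abstract can be illustrated with a minimal sketch. It assumes each frame already has a feature vector and a scalar quality score; the softmax weighting and the name `temporal_attention_pool` are illustrative assumptions, not the TriANet implementation.

```python
import math


def temporal_attention_pool(frame_features, quality_scores):
    """Pool per-frame feature vectors into one clip-level vector.

    Hypothetical helper: weights are a softmax over per-frame quality
    scores, so low-scoring (e.g. occluded) frames contribute less.
    """
    if not frame_features or len(frame_features) != len(quality_scores):
        raise ValueError("need one quality score per frame feature")

    # Numerically stable softmax over the quality scores.
    top = max(quality_scores)
    exps = [math.exp(s - top) for s in quality_scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the frame features, dimension by dimension.
    dim = len(frame_features[0])
    pooled = [
        sum(w * feat[i] for w, feat in zip(weights, frame_features))
        for i in range(dim)
    ]
    return pooled, weights
```

With equal scores the result is the plain average of the frames; lowering one frame's score shifts the pooled vector toward the remaining frames.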
Conditional Wasserstein Generative Adversarial Networks for Rebalancing Iris Image Datasets
Yung-Hui LI Muhammad Saqlain ASLAM Latifa Nabila HARFIYA Ching-Chun CHANG

PAPER-Artificial Intelligence, Data Mining

Publicized: 2021/06/01
Vol: E104-D No:9
Page(s): 1450-1458
The recent development of deep learning-based generative models has sharply intensified interest in data synthesis and its applications. Data synthesis is especially important for pattern recognition tasks in which some classes of data are rare and difficult to collect. In an iris dataset, for instance, the minority class samples include images of eyes with glasses, oversized or undersized pupils, misaligned iris locations, and irises occluded or contaminated by eyelids, eyelashes, or lighting reflections. Such class-imbalanced datasets often result in biased classification performance. Generative adversarial networks (GANs) are one of the most promising frameworks; they learn to generate synthetic data through a two-player minimax game between a generator and a discriminator. In this paper, we utilize the state-of-the-art conditional Wasserstein generative adversarial network with gradient penalty (CWGAN-GP) to generate the minority class of iris images, which greatly reduces the human labor cost of collecting rare data. With our model, researchers can generate as many iris images of rare cases as they want, which helps in developing any deep learning algorithm whenever a large dataset is needed.
Capsule Network with Shortcut Routing (Open Access)
Thanh Vu DANG Hoang Trong VO Gwang Hyun YU Jin Young KIM

PAPER-Image

Publicized: 2021/01/27
Vol: E104-A No:8
Page(s): 1043-1050
Capsules are fundamental informative units that are introduced into capsule networks to manipulate the hierarchical presentation of patterns. The part-whole relationship of an entity is learned through capsule layers, using a routing-by-agreement mechanism that is approximated by a voting procedure. Nevertheless, existing routing methods are computationally inefficient. We address this issue by proposing a novel routing mechanism, namely “shortcut routing”, that directly learns to activate global capsules from local capsules. In our method, the number of operations in the routing procedure is reduced by omitting the capsules in intermediate layers, resulting in lighter routing. To further address the computational problem, we investigate an attention-based approach and propose fuzzy coefficients, which have been found to be more efficient than mixture coefficients from EM routing. Our method achieves on-par classification results on the MNIST (99.52%), smallNORB (93.91%), and affNIST (89.02%) datasets. Compared to EM routing, our fuzzy-based and attention-based routing methods attain reductions of 1.42 and 2.5 in terms of the number of calculations.
Video Inpainting by Frame Alignment with Deformable Convolution
Yusuke HARA Xueting WANG Toshihiko YAMASAKI

PAPER-Image Processing and Video Processing

Publicized: 2021/04/22
Vol: E104-D No:8
Page(s): 1349-1358
Video inpainting is the task of filling missing regions in videos. In this task, it is important to efficiently use information from other frames and generate plausible results with sufficient temporal consistency. In this paper, we present a video inpainting method that jointly uses affine transformation and deformable convolutions for frame alignment. The former is responsible for rough frame-scale alignment and the latter performs fine pixel-level alignment. Our model does not depend on 3D convolutions, which limit the temporal window, or on troublesome flow estimation. The proposed method achieves improved object removal results and better PSNR and SSIM values compared with previous learning-based methods.
Hybrid Electrical/Optical Switch Architectures for Training Distributed Deep Learning in Large-Scale
Thao-Nguyen TRUONG Ryousei TAKANO

PAPER-Information Network

Publicized: 2021/04/23
Vol: E104-D No:8
Page(s): 1332-1339
Data parallelism is the dominant method used to train deep learning (DL) models on High-Performance Computing systems such as large-scale GPU clusters. When training a DL model on a large number of nodes, inter-node communication becomes a bottleneck due to its relatively higher latency and lower link bandwidth (compared with intra-node communication). Although some communication techniques have been proposed to cope with this problem, all of these approaches aim to deal with the large message size issue while diminishing the effect of the inter-node network's limitations. In this study, we investigate the benefit of increasing inter-node link bandwidth by using hybrid switching systems, i.e., Electrical Packet Switching and Optical Circuit Switching. We found that the typical data transfer of synchronous data-parallel training is long-lived and rarely changes, so it can be sped up with optical switching. Simulation results on the SimGrid simulator show that our approach speeds up the training of deep learning applications, especially at large scale.
An Efficient Deep Learning Based Coarse-to-Fine Cephalometric Landmark Detection Method
Yu SONG Xu QIAO Yutaro IWAMOTO Yen-Wei CHEN Yili CHEN

PAPER-Image Processing and Video Processing

Publicized: 2021/05/14
Vol: E104-D No:8
Page(s): 1359-1366
Accurate and automatic quantitative cephalometry analysis is of great importance in orthodontics. The fundamental step in cephalometry analysis is to annotate anatomical landmarks of interest on X-ray images. Computer-aided automatic annotation remains an open topic. In this paper, we propose an efficient deep learning-based coarse-to-fine approach to realize accurate landmark detection. In the coarse detection step, we train a deep learning-based deformable transformation model by using training samples. We register test images to the reference image (one training image) using the trained model to predict coarse landmark locations on test images. Thus, regions of interest (ROIs) that include landmarks can be located. In the fine detection step, we utilize trained deep convolutional neural networks (CNNs) to detect landmarks in ROI patches. For each landmark, there is one corresponding neural network, which directly regresses the landmark's coordinates. The fine step can be considered a refinement or fine-tuning step based on the coarse detection step. We validated the proposed method on the public dataset from the 2015 International Symposium on Biomedical Imaging (ISBI) grand challenge. Compared with the state-of-the-art method, we not only achieved comparable detection accuracy (the mean radial error is about 1.0-1.6mm), but also greatly shortened the computation time (4 seconds per image).
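The mean radial error quoted in this abstract is the average Euclidean distance between predicted and annotated landmarks. A minimal sketch follows; the function name and the `mm_per_pixel` parameter are assumptions for illustration, and the pixel spacing must come from the dataset in use.

```python
import math


def mean_radial_error(predicted, ground_truth, mm_per_pixel=1.0):
    """Average Euclidean distance between matching landmark pairs.

    `predicted` and `ground_truth` are equal-length lists of (x, y)
    pixel coordinates. `mm_per_pixel` is a hypothetical scaling factor
    that converts pixel distances to millimetres.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need matching, non-empty landmark lists")

    distances = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    ]
    return mm_per_pixel * sum(distances) / len(distances)
```

For example, one exact prediction and one that is off by a 3-4-5 triangle give a mean radial error of 2.5 pixels.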
CJAM: Convolutional Neural Network Joint Attention Mechanism in Gait Recognition
Pengtao JIA Qi ZHAO Boze LI Jing ZHANG

PAPER

Publicized: 2021/04/28
Vol: E104-D No:8
Page(s): 1239-1249
Gait recognition distinguishes one individual from others according to the natural patterns of human gaits. Gait recognition is a challenging signal processing technology for biometric identification due to the ambiguity of contours and the complex feature extraction procedure. In this work, we propose a new model - the convolutional neural network (CNN) joint attention mechanism (CJAM) - to classify gait sequences and conduct person identification using the CASIA-A and CASIA-B gait datasets. The CNN model has the ability to extract gait features, and the attention mechanism continuously focuses on the most discriminative area to achieve person identification. We present a comprehensive transformation from gait image preprocessing to final identification. The results from 12 experiments show that the new attention model leads to a lower error rate than others. The CJAM model improved on the 3D-CNN, CNN-LSTM (long short-term memory), and the simple CNN by 8.44%, 2.94% and 1.45%, respectively.
Secret Key Generation Scheme Based on Deep Learning in FDD MIMO Systems
Zheng WAN Kaizhi HUANG Lu CHEN

LETTER-Artificial Intelligence, Data Mining

Publicized: 2021/04/07
Vol: E104-D No:7
Page(s): 1058-1062
In this paper, a deep learning-based secret key generation (SKG) scheme is proposed for FDD multiple-input and multiple-output (MIMO) systems. We build an encoder-decoder based convolutional neural network that characterizes the wireless environment in order to learn the mapping relationship between the uplink and downlink channels. The designed neural network can accurately predict the downlink channel state information based on the estimated uplink channel state information without any information feedback. Random secret keys can be generated from the downlink channel responses predicted by the neural network. Simulation results show that the deep learning-based SKG scheme can achieve significant performance improvement in terms of the key agreement ratio and achievable secret key rate.
Multi-View Texture Learning for Face Super-Resolution
Yu WANG Tao LU Feng YAO Yuntao WU Yanduo ZHANG

PAPER-Image Recognition, Computer Vision

Publicized: 2021/03/24
Vol: E104-D No:7
Page(s): 1028-1038
In recent years, single face image super-resolution (SR) using deep neural networks has been well developed. However, most face images captured by a camera in a real scene are different views of the same person, and existing traditional multi-frame image SR requires alignment between images. Because multi-view face images contain texture information from different views, which can be used as effective prior information, the challenge is how to use this prior information to reconstruct frontal face images. To solve these problems effectively, we propose a novel face SR network based on multi-view face images, which focuses on obtaining more texture information from multi-view face images to help the reconstruction of frontal face images. In this network, we also propose a texture attention mechanism that transfers high-precision texture compensation information to the frontal face image to obtain better visual effects. We conduct subjective and objective evaluations, and the experimental results show the great potential of multi-view face image SR. A comparison with other state-of-the-art deep learning SR methods shows that the proposed method has excellent performance.
Attention Voting Network with Prior Distance Augmented Loss for 6DoF Pose Estimation
Yong HE Ji LI Xuanhong ZHOU Zewei CHEN Xin LIU

PAPER-Image Recognition, Computer Vision

Publicized: 2021/03/26
Vol: E104-D No:7
Page(s): 1039-1048
6DoF pose estimation from a monocular RGB image is a challenging but fundamental task. Methods based on the unit direction vector-field representation and the Hough voting strategy have achieved state-of-the-art performance. Nevertheless, they apply the smooth l1 loss to learn the two elements of the unit vector separately, so the prior distance between the pixel and the keypoint is not taken into account, even though the positioning error is significantly affected by this distance. In this work, we propose a Prior Distance Augmented Loss (PDAL) to exploit the prior distance for a more accurate vector-field representation. Furthermore, we propose a lightweight channel-level attention module for adaptive feature fusion. Embedding this Adaptive Fusion Attention Module (AFAM) into the U-Net, we build an Attention Voting Network to further improve the performance of our method. We conduct extensive experiments on the LINEMOD, OCCLUSION and YCB-Video datasets to demonstrate the effectiveness and performance improvement of our methods. Our experiments show that the proposed methods bring significant performance gains and outperform state-of-the-art RGB-based methods without any post-refinement.
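The baseline that this abstract criticizes can be sketched as follows: each pixel regresses a unit vector pointing at the keypoint, and a smooth l1 loss is applied to each component. The helper names are hypothetical, and the sketch shows only the baseline, not the proposed PDAL, whose formula is not given here.

```python
import math


def unit_direction(pixel, keypoint):
    """Return the unit vector from pixel to keypoint and their distance."""
    dx = keypoint[0] - pixel[0]
    dy = keypoint[1] - pixel[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return (0.0, 0.0), 0.0
    return (dx / dist, dy / dist), dist


def smooth_l1(x, beta=1.0):
    """Smooth l1 (Huber-style) penalty on a scalar residual."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def direction_loss(predicted, target):
    """Baseline loss: smooth l1 applied to each vector component separately."""
    return sum(smooth_l1(p - t) for p, t in zip(predicted, target))
```

Two pixels at different distances along the same ray share the same target vector, so this loss treats them identically; that is the missing distance information the abstract refers to.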
Preliminary Performance Analysis of Distributed DNN Training with Relaxed Synchronization
Koichi SHIRAHATA Amir HADERBACHE Naoto FUKUMOTO Kohta NAKASHIMA

BRIEF PAPER

Publicized: 2020/12/01
Vol: E104-C No:6
Page(s): 257-260
Scalability of distributed DNN training can be limited by slowdown of specific processes due to unexpected hardware failures. We propose a dynamic process exclusion technique so that training throughput is maximized. Our evaluation using 32 processes with ResNet-50 shows that our proposed technique reduces slowdown by 12.5% to 50% without accuracy loss through excluding the slow processes.
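A toy model shows why excluding slow processes can raise throughput: in fully synchronous data-parallel training, each step waits for the slowest participant. The median-based exclusion rule, the `tolerance` value, and the function names below are illustrative assumptions, not the technique evaluated in the paper.

```python
from statistics import median


def select_active(step_times, tolerance=1.5):
    """Indices of processes no slower than `tolerance` times the median.

    Hypothetical exclusion rule, used here only for illustration.
    """
    cutoff = tolerance * median(step_times)
    return [i for i, t in enumerate(step_times) if t <= cutoff]


def throughput(step_times, active, samples_per_process=32):
    """Samples per second when every step waits for the slowest active process."""
    slowest = max(step_times[i] for i in active)
    return len(active) * samples_per_process / slowest
```

With step times of 1, 1, 1, and 4 seconds, dropping the straggler lowers the per-step sample count but shortens each step fourfold, so throughput in this toy model rises from 32 to 96 samples per second.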
Differentially Private Neural Networks with Bounded Activation Function
Kijung JUNG Hyukki LEE Yon Dohn CHUNG

LETTER-Artificial Intelligence, Data Mining

Publicized: 2021/03/18
Vol: E104-D No:6
Page(s): 905-908
Deep learning has shown outstanding performance in various fields, and it is increasingly deployed in privacy-critical domains. If sensitive data in a deep learning model are exposed, this can cause serious privacy threats. To protect individual privacy, we propose a novel activation function and stochastic gradient descent for applying differential privacy to deep learning. Through experiments, we show that the proposed method can effectively protect privacy and that its performance is better than that of previous approaches.
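For context, differentially private gradient descent is commonly built from two steps: clip each per-example gradient to a fixed norm, then add Gaussian noise to the sum. The sketch below shows that widely used recipe; it is not the paper's proposed activation function or optimizer, and all names and parameters are illustrative.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale a gradient vector so its l2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]

    # The noise standard deviation is proportional to the clipping norm.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]
```

A call such as `private_mean_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=random.Random(0))` returns a noisy average; the privacy guarantee depends on accounting that this sketch does not perform.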
MTGAN: Extending Test Case set for Deep Learning Image Classifier
Erhu LIU Song HUANG Cheng ZONG Changyou ZHENG Yongming YAO Jing ZHU Shiqi TANG Yanqiu WANG

PAPER-Software Engineering

Publicized: 2021/02/05
Vol: E104-D No:5
Page(s): 709-722
In recent years, deep learning has achieved excellent results in image recognition, voice processing, and other research areas, which has set off a new upsurge of research and application. Internal defects and external malicious attacks may threaten the safe and reliable operation of a deep learning system and even cause unbearable consequences. The technology for testing deep learning systems is still in its infancy. Traditional software testing technology is not applicable to testing deep learning systems. In addition, the characteristics of deep learning, such as complex application scenarios, the high dimensionality of input data, and the poor interpretability of operation logic, bring new challenges to testing. This paper focuses on the problem of test case generation and points out that adversarial examples can be used as test cases. The paper then proposes MTGAN, a framework based on Generative Adversarial Networks that generates test cases for deep learning image classifiers. Finally, this paper evaluates the effectiveness of MTGAN.
HAIF: A Hierarchical Attention-Based Model of Filtering Invalid Webpage
Chaoran ZHOU Jianping ZHAO Tai MA Xin ZHOU

PAPER

Publicized: 2021/02/25
Vol: E104-D No:5
Page(s): 659-668
In Internet applications, when users search for information, search engines invariably return some invalid webpages that do not contain valid information. These invalid webpages interfere with users' access to useful information, reduce the efficiency of users' information queries, and occupy Internet resources. Accurate and fast filtering of invalid webpages can purify the Internet environment and provide convenience for netizens. This paper proposes an invalid webpage filtering model (HAIF) based on deep learning and a hierarchical attention mechanism. HAIF improves the semantic and sequence information representation of webpage text by concatenating lexical-level embeddings and paragraph-level embeddings. HAIF introduces a hierarchical attention mechanism to optimize the extraction of text sequence features and webpage tag features. The local-level attention layer optimizes the local information in the plain text. By concatenating the input embeddings and the feature matrix obtained after the local-level attention calculation, it enriches the representation of information. The tag-level attention layer introduces webpage structural feature information into the attention calculation of different HTML tags, so that HAIF is more applicable to the Internet resource field. To evaluate the effectiveness of HAIF in filtering invalid pages, we conducted various experiments. Experimental results demonstrate that, compared with other baseline models, HAIF improves to various degrees on various evaluation criteria.
Action Recognition Using Pose Data in a Distributed Environment over the Edge and Cloud
Chikako TAKASAKI Atsuko TAKEFUSA Hidemoto NAKADA Masato OGUCHI

PAPER

Publicized: 2021/02/02
Vol: E104-D No:5
Page(s): 539-550
With the development of cameras and sensors and the spread of cloud computing, life logs can be easily acquired and stored in general households for the various services that utilize the logs. However, it is difficult to analyze moving images that are acquired by home sensors in real time using machine learning because the data size is too large and the computational complexity is too high. Moreover, collecting and accumulating in the cloud moving images that are captured at home and can be used to identify individuals may invade the privacy of application users. We propose a method of distributed processing over the edge and cloud that addresses the processing latency and the privacy concerns. On the edge (sensor) side, we extract feature vectors of human key points from moving images using OpenPose, which is a pose estimation library. On the cloud side, we recognize actions by machine learning using only the feature vectors. In this study, we compare the action recognition accuracies of multiple machine learning methods. In addition, we measure the analysis processing time at the sensor and the cloud to investigate the feasibility of recognizing actions in real time. Then, we evaluate the proposed system by comparing it with the 3D ResNet model in recognition experiments. The experimental results demonstrate that the action recognition accuracy is the highest when using LSTM and that the introduction of dropout in action recognition using 100 categories alleviates overfitting because the models can learn more generic human actions by increasing the variety of actions. In addition, it is demonstrated that preprocessing using OpenPose on the sensor side can substantially reduce the transfer quantity from the sensor to the cloud.
Deep Network for Parametric Bilinear Generalized Approximate Message Passing and Its Application in Compressive Sensing under Matrix Uncertainty
Jingjing SI Wenwen SUN Chuang LI Yinbo CHENG

LETTER-Digital Signal Processing

Publicized: 2020/09/29
Vol: E104-A No:4
Page(s): 751-756
Deep learning is playing an increasingly important role in the signal processing field due to its excellent performance on many inference problems. Parametric bilinear generalized approximate message passing (P-BiG-AMP) is a new approximate message passing based approach to a general class of structure-matrix bilinear estimation problems. In this letter, we propose a novel feed-forward neural network architecture that realizes the P-BiG-AMP methodology with deep learning for the inference problem of compressive sensing under matrix uncertainty. The linear transforms utilized in the recovery process and the parameters involved in the input and output channels of measurement are jointly learned from training data. Simulation results show that the trained P-BiG-AMP network can achieve higher reconstruction performance than the P-BiG-AMP algorithm with parameters tuned via the expectation-maximization method.
Backbone Alignment and Cascade Tiny Object Detecting Techniques for Dolphin Detection and Classification
Yih-Cherng LEE Hung-Wei HSU Jian-Jiun DING Wen HOU Lien-Shiang CHOU Ronald Y. CHANG

PAPER-Image

Publicized: 2020/09/29
Vol: E104-A No:4
Page(s): 734-743
Automatic tracking and classification are essential for studying the behaviors of wild animals. Dynamic long-distance photography, occlusion, protective coloration, and background noise irregularly interfere with the design of a computerized algorithm that reduces human labeling effort. Moreover, wild dolphin images are hard to acquire through on-the-spot investigations, which require a lot of waiting time, and it is hard to set up a fixed camera to automatically monitor dolphins on the ocean for several days. It is challenging to detect and classify a dolphin well from polluted photos using a single well-known deep learning method on a small dataset. Therefore, in this study, we propose a generic Cascade Small Object Detection (CSOD) algorithm for dolphin detection to handle small object problems, and we develop visualization to backbone based classification (V2BC) for removing noise, highlighting dolphin features, and classifying the name of the dolphin. The architecture of CSOD consists of the P-net and the F-net. The P-net uses the crude YOLOv3 detector as a core network to predict all the regions of interest (ROIs) in lower-resolution images. Then, the F-net, which is more robust, is applied to capture the ROIs from high-resolution photos to solve the problems of a single detector. Moreover, the V2BC method focuses on extracting significant regions of occluded dolphins and designs post-processing that references the backbone of dolphins to facilitate classification. We compare against state-of-the-art methods, including Faster R-CNN and YOLOv3 for detection and AlexNet, VGG, and ResNet for classification. All experiments show that the proposed algorithm based on CSOD and V2BC has excellent performance in dolphin detection and classification. Compared to the related classification works, the accuracy of the proposed design is over 14% higher. Moreover, our proposed CSOD detection system has 42% higher performance than the original YOLOv3 architecture.
Robustness of Deep Learning Models in Dermatological Evaluation: A Critical Assessment
Sourav MISHRA Subhajit CHAUDHURY Hideaki IMAIZUMI Toshihiko YAMASAKI

PAPER-Artificial Intelligence, Data Mining

Publicized: 2020/12/22
Vol: E104-D No:3
Page(s): 419-429
Our paper attempts to critically assess the robustness of deep learning methods in dermatological evaluation. Although deep learning is increasingly sought as a means to improve dermatological diagnostics, the performance of models and methods has rarely been investigated beyond studies done under ideal settings. We aim to look beyond results obtained on curated and ideal data corpora by investigating resilience and performance on user-submitted data. Assessing under a few imitated conditions, we found that the overall accuracy drops and individual predictions change significantly in many cases despite robust training.
Benchmarking Modern Edge Devices for AI Applications
Pilsung KANG Jongmin JO

PAPER-Computer System

Publicized: 2020/12/08
Vol: E104-D No:3
Page(s): 394-403
AI (artificial intelligence) has grown at an overwhelming speed for the last decade, to the extent that it has become one of the mainstream tools that drive the advancements in science and technology. Meanwhile, the paradigm of edge computing has emerged as one of the foremost areas in which applications using the AI technology are being most actively researched, due to its potential benefits and impact on today's widespread networked computing environments. In this paper, we evaluate two major entry-level offerings in the state-of-the-art edge device technology, which highlight increased computing power and specialized hardware support for AI applications. We perform a set of deep learning benchmarks on the devices to measure their performance. By comparing the performance with other GPU (graphics processing unit) accelerated systems in different platforms, we assess the computational capability of the modern edge devices featuring a significant amount of hardware parallelism.
Multi-Category Image Super-Resolution with Convolutional Neural Network and Multi-Task Learning
Kazuya URAZOE Nobutaka KUROKI Yu KATO Shinya OHTANI Tetsuya HIROSE Masahiro NUMA

PAPER-Image Processing and Video Processing

Publicized: 2020/10/02
Vol: E104-D No:1
Page(s): 183-193
This paper presents an image super-resolution technique using a convolutional neural network (CNN) and multi-task learning for multiple image categories. The image categories include natural, manga, and text images. Their features differ from each other. However, several CNNs for super-resolution are trained with a single category. If the input image category is different from that of the training images, the performance of super-resolution is degraded. There are two possible solutions to manage multiple categories with conventional CNNs. The first involves preparing a CNN for every category. This solution, however, requires a category classifier to select an appropriate CNN. The second is to learn all categories with a single CNN. In this solution, the CNN cannot optimize its internal behavior for each category. Therefore, this paper presents a super-resolution CNN architecture for multiple image categories. The proposed CNN has two parallel outputs for a high-resolution image and a category label. The main CNN for the high-resolution image is a standard three-convolutional-layer architecture, and the sub neural network for the category label branches out from its middle layer and consists of two fully-connected layers. This architecture can simultaneously learn the high-resolution image and its category using multi-task learning. The category information is used for optimizing the super-resolution. In an applied setting, the proposed CNN can automatically estimate the input image category and change its internal behavior. Experimental results of 2× image magnification have shown that the average peak signal-to-noise ratio for the proposed method is approximately 0.22 dB higher than that for the conventional super-resolution, with no difference in processing time and parameters. We have confirmed that the proposed method is useful when the input image category varies.
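The peak signal-to-noise ratio used to report the 0.22 dB gain is defined as 10·log10(MAX² / MSE). A minimal sketch for grayscale images stored as nested lists follows; the function name and the 8-bit default for `max_value` are assumptions for illustration.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-sized images.

    Images are lists of rows of pixel intensities. `max_value` is the
    largest possible intensity (255 assumed here for 8-bit images).
    """
    ref_pixels = [p for row in reference for p in row]
    rec_pixels = [p for row in reconstructed for p in row]
    if not ref_pixels or len(ref_pixels) != len(rec_pixels):
        raise ValueError("images must be non-empty and the same size")

    mse = sum((a - b) ** 2 for a, b in zip(ref_pixels, rec_pixels)) / len(ref_pixels)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, a 0.22 dB increase corresponds to a mean squared error that is roughly 5% lower.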
